A redundancy-removing feature selection algorithm for nominal data

نویسنده

  • Zhihua Li
چکیده

Computations to obtain the mutual information that is in nominal data are usually difficult, and there will always be more redundancy in a nominal dataset, which means that an efficient nominal-data feature selection method is difficult to find. In this paper, a nominal-data feature selection method based on mutual information without data transformation, called the redundancy-removing more relevance less redundancy algorithm, is proposed. By forming several new information-related definitions and the corresponding computational methods, the proposed method can compute the information-related amount of nominal data directly. Furthermore, by creating a new evaluation function that considers both the relevance and the redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented feature selection method takes commonly used MIFS-like forms, it is capable of handling high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods using three benchmarking nominal datasets with two different classifiers. The experimental results demonstrate the average advantage of the presented algorithm over the well-known NMIFS algorithm in terms of the feature selection and classification accuracy, which indicates that the proposed method has a promising performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm

This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a  structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the  measure...

متن کامل

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

متن کامل

A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered as an important issue in classification domain. Selecting a good feature through maximum relevance criterion to class label and minimum redundancy among features affect improving the classification accuracy. However, most current feature selection algorithms just work with the centralized methods. In this paper, we suggest a distributed version of the mRMR featu...

متن کامل

IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF

Increasing the use of Internet and some phenomena such as sensor networks has led to an unnecessary increasing the volume of information. Though it has many benefits, it causes problems such as storage space requirements and better processors, as well as data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...

متن کامل

Feature Selection for High-Dimensional Data - A Pearson Redundancy Based Filter

An algorithm for filtering information based on the Pearson χ test approach has been implemented and tested on feature selection. This test is frequently used in biomedical data analysis and should be used only for nominal (discretized) features. This algorithm has only one parameter, statistical confidence level that two distributions are identical. Empirical comparisons with four other state-...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • PeerJ Computer Science

دوره 1  شماره 

صفحات  -

تاریخ انتشار 2015